Using Kart and GitHub for versioning and collaborating with spatial data in archaeological research

Archeo.FOSS 17 (Turin, 12-13 December)

Andrea Titolo

University of Turin

Alessio Palmisano

University of Turin

Talk overview

  • Introduction
    • Open Science and version control in archaeology
    • Git and its limitations
  • Git for geospatial data
    • Description and features of Kart
  • Practical applications of Kart in archaeology
    • How we are using Kart
    • Limitations of Kart
  • Thoughts and conclusions

Introduction

Open Science and transparency of the process

  • One of the many aims of Open Science: openness and transparency of the processes behind data creation and results
  • “Data must have history” Strupler and Wilkinson (2017)

Wallis (2022)

Version control

  • Transparent process through "snapshots" taken at different stages
  • Easy roll-back to previous versions if something goes wrong
  • Replaces the endless cycle of corrected and renamed copies of the same file
  • Greater accountability and better documentation (Kansa 2012)
  • Enhances Open Science practices (Marwick 2017)

Source: xkcd
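The snapshot-and-roll-back idea can be illustrated with a toy linear history. This is a conceptual sketch only: `SnapshotHistory` is a hypothetical class, the site names are placeholders, and real systems such as Git store content-addressed objects rather than full in-memory copies.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SnapshotHistory:
    """Hypothetical toy history: each commit keeps an immutable copy of the data."""
    snapshots: list = field(default_factory=list)  # list of (message, data) tuples

    def commit(self, data, message):
        # Deep-copy so later edits to `data` cannot alter the recorded snapshot.
        self.snapshots.append((message, copy.deepcopy(data)))
        return len(self.snapshots) - 1  # the index acts as a simple version id

    def log(self):
        return [f"{i}: {msg}" for i, (msg, _) in enumerate(self.snapshots)]

    def checkout(self, version):
        # Roll back by returning a fresh copy of an earlier snapshot.
        return copy.deepcopy(self.snapshots[version][1])


sites = {1: "Tell A", 2: "Tell B"}
history = SnapshotHistory()
history.commit(sites, "Initial site list")
sites[2] = "Tell B (wrong coordinates)"
del sites[1]
history.commit(sites, "Accidental edit")

restored = history.checkout(0)
print(history.log())
print(restored)
```

Every state stays inspectable through `log()`, and going back is a read of an old snapshot rather than a manual undo.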

Git

  • Distributed version control system
  • Originally developed to track changes in the Linux kernel
  • Also adopted in non-coding environments
  • Git is still not user-friendly software
  • Graphical frontends do not always help

Source: xkcd

Distributed version control and archaeology

  • Archaeology has come a long way in adopting version control
  • Used mainly for programming/scripting and for publications
  • Some attempts to adapt it to fieldwork practices


Source: Strupler and Wilkinson (2017: 5)

Source: Strupler and Wilkinson (2017: 4)

Git and binary files

  • Binary files: images, Word documents, Excel spreadsheets
  • Git is not as efficient with binary files as it is with plain text (it stores a whole new copy of the file at every change)
  • Repository size grows quickly, and changes are hard to inspect
  • For text files, plain text can sometimes be the answer, but what about GIS and relational databases?
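The difference between text and binary files can be sketched with Python's standard library. Git names each file version by the SHA-1 of a `blob <size>\0` header followed by the content; changing one byte of a binary file therefore produces an unrelated object, and there is no readable line diff to show. The CSV rows and byte buffer below are placeholder data. Git's packfiles can later delta-compress similar objects, so "a whole new copy every time" is a simplification, but the lack of an inspectable diff remains.

```python
import difflib
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id Git gives a file's content: SHA-1 of 'blob <size>\\0' plus the bytes."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# Plain text: a one-line edit yields a small, readable diff.
old_csv = "id,site\n1,Tell A\n2,Tell B\n"
new_csv = "id,site\n1,Tell A\n2,Tell Beta\n"
text_diff = list(
    difflib.unified_diff(old_csv.splitlines(), new_csv.splitlines(), lineterm="")
)
print("\n".join(text_diff))

# Binary: flipping a single byte gives a completely different blob id,
# so Git records a new object and cannot show what changed inside it.
old_bin = bytes(range(256)) * 4
new_bin = bytearray(old_bin)
new_bin[100] ^= 0xFF
print(git_blob_id(old_bin))
print(git_blob_id(bytes(new_bin)))
```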


What about geospatial data?

  • In GIS, the research process is often obscured by the point-and-click nature of the GUI
  • QGIS models can help make some analyses reproducible
  • Scripts can document data cleaning

For many in archaeology, for whom using GIS to visualise results is essentially a graphical-based point-and-click process, advocating a return to code may seem like a backward step. We understand the arguments for usability, and acknowledge that intermediate tools which can bridge point-and-click with code-based approaches are desperately required.

Strupler and Wilkinson (2017)

Git for geospatial data

Git for distributed version control of geospatial data

  • Kart: a distributed version control system for geospatial and tabular data
  • Cross-platform
  • FOSS
  • GPL v2 license
  • Developed by Koordinates in 2020
  • More info: presentations by Coup (2023, FOSS4G), Coup (2022a, FOSS4G:UK), Olaya (2022, QGISDay) and Coup (2022b, PostGISDay)


Kart features

  • Works with different formats and databases: GeoPackage, PostgreSQL/PostGIS, MySQL, Microsoft SQL Server
  • Supports most geospatial data types: vectors, rasters, point clouds (including lidar), etc.
  • Planned support for shapefiles
  • “Built on git, works like git”
  • Ships its own bundled version of Git and Git Large File Storage (LFS)
  • No need to install Git separately

Kart features

  • Tracks changes at the row (feature) and cell (attribute) level
  • Command-line interface (CLI) tool
  • Standard Git workflow
    • kart status
    • kart add
    • kart commit
    • kart pull
    • kart push
    • kart log
    • kart switch/branch
  • Scriptable
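Because Kart is a CLI tool, the workflow above can be scripted. The sketch below is a hypothetical Python wrapper (`commit_cycle` and `run_cycle` are illustrative names) that uses only the `kart status`, `kart commit`, `kart pull` and `kart push` commands listed on the slide. It assumes the repository already has a remote and upstream branch configured, and by default it only prints the commands; it calls Kart through `subprocess` only when `dry_run=False` and a `kart` executable is on the `PATH`.

```python
import shutil
import subprocess


def commit_cycle(message: str) -> list[list[str]]:
    """Hypothetical save-and-share round built from the Git-style Kart commands."""
    return [
        ["kart", "status"],                 # show what changed in the working copy
        ["kart", "commit", "-m", message],  # record the changes with a message
        ["kart", "pull"],                   # bring in collaborators' commits
        ["kart", "push"],                   # publish to the remote repository
    ]


def run_cycle(repo_dir: str, message: str, dry_run: bool = True) -> None:
    kart_available = shutil.which("kart") is not None
    for cmd in commit_cycle(message):
        if dry_run or not kart_available:
            print("would run:", " ".join(cmd))
            continue
        # check=True stops the cycle at the first failing command,
        # e.g. a pull that ends in a merge conflict.
        subprocess.run(cmd, cwd=repo_dir, check=True)


run_cycle(".", "Add survey points from field season")
```

Keeping the command list separate from its execution makes the sequence easy to inspect, log, or reuse in a scheduled job.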

Kart QGIS Plugin

  • QGIS plugin offers a Graphical User Interface
  • All Kart commands are available
  • Visual tool to inspect changes

Remote Collaboration

  • Host data in remote repositories
  • Compatible with all QGIS styles
  • Potential to mitigate issues regarding data sharing

Kart way of storing data

  • Data are broken down into an SQL-like tabular model
  • This structure is visible in the remote repository, not in the working copy (local folder)
  • The GeoPackage (or any other working-copy format) is not stored in the Kart remote repository
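Why a tabular model allows row- and cell-level tracking can be sketched by treating a layer as a table keyed by its primary key. This is a conceptual illustration, not Kart's actual storage format: `diff_tables` and `row_id` are hypothetical helpers, geometries are plain WKT strings, and the sites are placeholder data.

```python
import hashlib
import json


def row_id(row: dict) -> str:
    """Stable hash of one row, so unchanged rows are recognised cheaply."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()


def diff_tables(old: dict, new: dict) -> dict:
    """Compare two versions of a table stored as {primary_key: {column: value}}."""
    changes = {
        "inserts": sorted(new.keys() - old.keys()),
        "deletes": sorted(old.keys() - new.keys()),
        "updates": {},
    }
    for pk in sorted(old.keys() & new.keys()):
        if row_id(old[pk]) == row_id(new[pk]):
            continue  # identical row: nothing to store or report
        # Cell level: keep only the columns whose values differ.
        columns = old[pk].keys() | new[pk].keys()
        changes["updates"][pk] = {
            col: (old[pk].get(col), new[pk].get(col))
            for col in sorted(columns)
            if old[pk].get(col) != new[pk].get(col)
        }
    return changes


v1 = {
    1: {"name": "Tell A", "period": "Iron II", "geom": "POINT(35.2 31.7)"},
    2: {"name": "Tell B", "period": "Iron I", "geom": "POINT(35.4 32.1)"},
}
v2 = {
    1: {"name": "Tell A", "period": "Iron II", "geom": "POINT(35.2 31.7)"},
    2: {"name": "Tell B", "period": "Iron II", "geom": "POINT(35.4 32.1)"},
    3: {"name": "Tell C", "period": "Iron II", "geom": "POINT(35.0 31.9)"},
}
print(diff_tables(v1, v2))
```

Only the inserted row and the single edited cell show up; with a monolithic binary file, the same edit would appear as "file changed".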


Kart for archaeology

  • Fieldwork (no internet connection needed, except to push changes to a remote)
    • Remote repository can also be another folder
  • Desk-based work
    • Collaboration inside projects
  • Upholding Open Science practices

By Ainsley Seago (2014) CC BY 4.0

Project presentation

  • The Governance Policies and Political Landscapes in the Southern Levant under the Neo-Assyrian Empire
  • Holistic approach integrating archaeological, textual and geographical data into a spatial framework.
  • Understand Neo-Assyrian imperial strategies, local and regional responses, and their effects on population and settlement patterns in the Southern Levant region
  • Collection and processing of geospatial data
  • Remote collaboration, avoiding the back-and-forth of different versions of multiple files
  • Follow Open Science practices from the start

Project structure

  • A GitHub organization for the project
  • Project actions treated as GitHub issues
  • Separate repositories for different datasets
  • Granular control of licenses, publications, repo access

Using Kart in our project

  • Relatively simple workflow
  • Two main uses
  • Collaboration between project members
    • Simple Git workflow
    • A separate branch for each person, pushed and merged into main
  • Keeping track of dataset changes
    • Transparency of the process
    • File (and methods) history
    • Inspect beyond the final product

Using Kart in our project - issues

  • Few issues so far (small team)
  • Collaboration tested on macOS 13 (Ventura) and macOS 12 (Monterey); issues with macOS 11 (Big Sur)
  • Kart also tested on Ubuntu-based Linux (Pop!_OS)
  • Primary key conflicts when working with GeoPackage
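A likely source of these primary key conflicts is that each collaborator's GeoPackage typically assigns the next free `fid` to a new feature, so two people adding different features on separate branches can end up with the same key. The sketch below shows how such clashes could be detected and one hypothetical way to fix them (renumbering the incoming rows above every key in use). It is not Kart's conflict resolution, which is done through its own command-line tools; `find_pk_conflicts` and `renumber_theirs` are illustrative names, and the naive merge ignores edits and deletions.

```python
def find_pk_conflicts(base: dict, ours: dict, theirs: dict) -> list:
    """Primary keys inserted on both branches with different content."""
    added_ours = ours.keys() - base.keys()
    added_theirs = theirs.keys() - base.keys()
    return sorted(pk for pk in added_ours & added_theirs if ours[pk] != theirs[pk])


def renumber_theirs(ours: dict, theirs: dict, conflicts: list) -> dict:
    """Hypothetical fix: move each conflicting incoming row to a fresh key."""
    next_pk = max(ours.keys() | theirs.keys()) + 1
    fixed = dict(theirs)
    for pk in conflicts:
        fixed[next_pk] = fixed.pop(pk)
        next_pk += 1
    return fixed


base = {1: "Tell A", 2: "Tell B"}
ours = {**base, 3: "Tell C"}       # person 1 adds a site, which gets fid 3
theirs = {**base, 3: "Khirbet D"}  # person 2 adds another site, also fid 3

conflicts = find_pk_conflicts(base, ours, theirs)
# Naive merge for illustration only: edits and deletions are not handled.
merged = {**ours, **renumber_theirs(ours, theirs, conflicts)}
print(conflicts)
print(merged)
```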

Using Kart in our project

  • Public project wiki
  • How to use the dataset and how to use Kart
  • Tips to solve common issues
  • Methodology and conventions
  • Internal use and external reference
  • Updated as the project proceeds

Conclusions

Advantages

  • Git-based tool
  • Graphical solution for those unfamiliar with git
  • Kart can fit well into archaeological Open Science practices
  • More transparency both during and after data creation process
  • No single file to download from online repositories (site stewardship)

Disadvantages

  • Not an easily accessible tool
  • Graphical interface still needs more work
  • Solving primary key conflicts requires the command-line
  • Documentation is still catching up with recent developments
    • Contributing upstream from our wiki

Thank you!

Andrea Titolo (andrea.titolo@unito.it)

Alessio Palmisano (alessio.palmisano@unito.it)


Slides Source Code


Works Cited